{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sM-NK2totNNz"
   },
   "source": [
    "# 第28章 生成对抗网络"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "w99sidXKUPeD"
   },
   "source": [
    "# 习题28.1\n",
    "\n",
    "&emsp;&emsp;GAN的生成网络的学习可以定义为以下的最小化问题：\n",
    "\n",
    "$$\n",
    "\\min \\limits_{\\theta} \\left \\{ E_{\\boldsymbol{z} \\sim P_{\\text {seed }}(\\boldsymbol{z})} [ \\log (1 - D( G(\\boldsymbol{z} ; \\boldsymbol{\\theta}) ; \\bar{\\boldsymbol{\\varphi}}))-\\log ( D( G(\\boldsymbol{z} ; \\boldsymbol{\\theta}) ; \\bar{\\boldsymbol{\\varphi}} )) ] \\} \\right.\n",
    "$$\n",
    "\n",
    "比较与式（28.2）的不同，并考虑其作用。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ml3h88oCBNzX"
   },
   "source": [
    "**解答：**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ml3h88oCBNzX"
   },
   "source": [
    "**解答思路：**\n",
    "\n",
    "1. 给出式(28.2)定义的生成网络学习的最小化目标函数\n",
    "2. 比较题中公式和式(28.2)定义的生成网络学习的最小化目标函数函数\n",
    "3. 分析题中的最小化问题的目标函数的作用"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ml3h88oCBNzX"
   },
   "source": [
    "**解答步骤：**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**第1步：给出式(28.2)定义的生成网络学习的最小化目标函数**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;根据书中第505页的公式28.2：\n",
    "> &emsp;&emsp;假设已给训练数据$\\mathcal{D}$遵循分布$P_{\\text{data}} (\\boldsymbol{x})$，其中$\\boldsymbol{x}$是样本。生成网络用 $\\boldsymbol{x} = G(\\boldsymbol{z}; \\boldsymbol{\\theta})$表示，其中$\\boldsymbol{z}$是输入向量（种子），$\\boldsymbol{x}$ 是输出向量（生成数据），$\\boldsymbol{\\theta}$是网络参数。判别网络是一个二类分类器，用$P(1 | \\boldsymbol{x}) = D(\\boldsymbol{x}; \\boldsymbol{\\theta})$表示，其中$\\boldsymbol{x}$是输入向量，$P(1 | \\boldsymbol{x})$和$1 - P(1 | \\boldsymbol{x})$是输出概率，分别白哦是输入$\\boldsymbol{x}$来自训练数据和生成数据的概率，$\\varphi$是网络参数。种子$z$遵循分布$P_{\\text{seed}} (\\boldsymbol{z})$，生成网络生成的数据分布表示为$P_{\\text{gen}}(\\boldsymbol{x})$，由$P_{\\text{seed}}(\\boldsymbol{z})$和$\\boldsymbol{x} = G(\\boldsymbol{z}; \\boldsymbol{\\theta})$决定。    \n",
    "> &emsp;&emsp;如果判别网络参数 $\\varphi$ 固定，可以通过最小化以下目标函数学习生成网络参数θ。\n",
    "$$\n",
    "\\min \\limits_{\\theta} \\left \\{ E_{\\boldsymbol{z} \\sim P_{\\text {seed }}(\\boldsymbol{z})}[\\log (1 - D( G(\\boldsymbol{z} ; \\boldsymbol{\\theta}) ; \\bar{\\varphi})) \\} \\right.\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ml3h88oCBNzX"
   },
   "source": [
    "**第2步：比较题中公式和式(28.2)定义的生成网络学习的最小化目标函数**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;题中GAN的生成网络的学习可以定义为以下的最小化问题：\n",
    "$$\n",
    "\\min \\limits_{\\theta} \\left \\{ E_{\\boldsymbol{z} \\sim P_{\\text {seed }}(\\boldsymbol{z})} [ \\log (1 - D( G(\\boldsymbol{z} ; \\boldsymbol{\\theta}) ; \\bar{\\boldsymbol{\\varphi}})) - \\log ( D( G(\\boldsymbol{z} ; \\boldsymbol{\\theta}) ; \\bar{\\boldsymbol{\\varphi}} )) ] \\} \\right.\n",
    "$$\n",
    "与式（28.2）比较，式中多减去了一项：\n",
    "$$\n",
    "\\underset {\\theta}{min} \\left\\{E_{\\boldsymbol{z} \\sim P_{\\text {seed}} (\\boldsymbol{z}) } [\\log (D(G(\\boldsymbol{z} ; \\boldsymbol{\\theta}) ; \\bar{\\boldsymbol{\\varphi}})) \\} \\right.\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ml3h88oCBNzX"
   },
   "source": [
    "**第3步：分析题中的最小化问题的目标函数的作用**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;令$x = D(G(\\boldsymbol{z} ; \\boldsymbol{\\theta}) ; \\bar{\\boldsymbol{\\varphi}})$，则最小化问题可表示为\n",
    "$$\n",
    "\\min \\limits_{\\theta} \\left \\{ E_{\\boldsymbol{z} \\sim P_{\\text {seed }}(\\boldsymbol{z})} [ \\log (1-x) - \\log x] \\} \\right.\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ml3h88oCBNzX"
   },
   "source": [
    "&emsp;&emsp;题中最小化问题的目标函数的作用：\n",
    "\n",
    "1. 加速目标函数优化过程的收敛速度：对于$\\log (1-x) - \\log x$，其求导结果为$\\displaystyle -\\frac{1}{(1-x) x}$，可以有效防止$x$取值时导致的梯度减小而难以训练的情况。\n",
    "2. 防止出现在学习的初始情况，由于生成网络较弱，判别网络很容易区分生成数据和判别数据，导致生成网络的学习难以进行下去的情况。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "K84KhGANWRCU"
   },
   "source": [
    "# 习题28.2\n",
    "\n",
    "&emsp;&emsp;两个人进行零和博弈，参与人$X$和$Y$可选择的策略分别是$\\mathcal{X} = \\{1, 2 \\}$ 和 $\\mathcal{Y} = \\{1, 2\\}$。在博弈中，若参与人$X$和$Y$分别选择$i \\in \\mathcal{X}$和$j \\in \\mathcal{Y}$，则$X$的损失或$Y$的收益是$a_{ij}$。整体由矩阵$A=(a_{ij})$表示，矩阵$A$定义为:\n",
    "\n",
    "$$ \n",
    "A = \n",
    "\\left[ \\begin{array}{cc} \n",
    "-1 & 2 \\\\ \n",
    "4 & 1 \n",
    "\\end{array} \\right ] \n",
    "$$\n",
    "\n",
    "针对这个博弈求 $\\min_i \\max_j a_{ij}$和$\\max_j \\min_i a_{ij}$，并验证这时$\\max_j \\min_i a_{ij} \\leqslant \\min_i \\max_j a_{ij}$成立。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "L5wM7ZoM0pQ9"
   },
   "source": [
    "**解答：**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "L5wM7ZoM0pQ9"
   },
   "source": [
    "**解答思路：**\n",
    "\n",
    "1. 给出零和博弈的概念\n",
    "2. 结合零和博弈的概念，给出题中的求解方法\n",
    "3. 自编程对该博弈进行求解\n",
    "4. 验证这时$\\max_j \\min_i a_{ij}$和$\\min_i \\max_j a_{ij}$的关系"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "L5wM7ZoM0pQ9"
   },
   "source": [
    "**解答步骤：**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**第1步：零和博弈的概念**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;根据维基百科中的零和博弈  \n",
    "（参考资料：https://zh.wikipedia.org/wiki/%E9%9B%B6%E5%92%8C%E5%8D%9A%E5%BC%88 ）\n",
    "> &emsp;&emsp;零和博弈，又称零和游戏或零和赛局（Zero-sum game）与非零和博弈相对，是博弈论的一个概念，属非合作博弈。零和博弈表示所有博弈方的利益之和为零或一个常数，即一方有所得，其他方必有所失。在零和博弈中，博弈各方是不合作的。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**第2步：结合零和博弈的概念，给出题中的求解方法**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "L5wM7ZoM0pQ9"
   },
   "source": [
    "&emsp;&emsp;结合第1步给出的零和博弈概念，该博弈的收益矩阵为：\n",
    "$$ \n",
    "A = \n",
    "\\left[ \\begin{array}{cc} \n",
    "-1 & 2 \\\\ \n",
    "4 & 1 \n",
    "\\end{array} \\right ] \n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "L5wM7ZoM0pQ9"
   },
   "source": [
    "&emsp;&emsp;考虑最小最大化原则，即如果参与人$X$先选择策略，此时参与人$X$会选择对自身损失最小的策略，而参与人$Y$后选择策略，则会选择对自身收益最大的策略，即对应$\\max_j \\min_i a_{ij}$。  \n",
    "&emsp;&emsp;而对应计算过程，则是先按行计算每一行最小值，然后在每一行的最小值中选择最大值，得到$\\max_j \\min_i a_{ij}$。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "L5wM7ZoM0pQ9"
   },
   "source": [
    "&emsp;&emsp;考虑最大最小化原则，即如果参与人$Y$先选择策略，此时参与人$Y$会选择对自身收益最大的策略，而参与人$X$后选择策略，则会选择对自身损失最小的策略，即对应$\\min_i \\max_j a_{ij}$。  \n",
    "&emsp;&emsp;而对应计算过程，则是先按列计算每一行最大值，然后在每一行的最大值中选择最小值，得到$\\min_i \\max_j a_{ij}$。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "L5wM7ZoM0pQ9"
   },
   "source": [
    "**第3步：自编程对该博弈进行求解**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "L5wM7ZoM0pQ9"
   },
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "L5wM7ZoM0pQ9"
   },
   "outputs": [],
   "source": [
    "def minmax_function(A):\n",
    "    \"\"\"\n",
    "    从收益矩阵中计算minmax的算法\n",
    "    :param A: 收益矩阵\n",
    "    :return: 计算得到的minmax结果\n",
    "    \"\"\"\n",
    "    index_max = []\n",
    "    for i in range(len(A)):\n",
    "        # 计算每一行的最大值\n",
    "        index_max.append(A[i,:].max())\n",
    "    \n",
    "    # 计算每一行的最大值中的最小值\n",
    "    minmax = min(index_max)\n",
    "    return minmax"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "L5wM7ZoM0pQ9"
   },
   "outputs": [],
   "source": [
    "def maxmin_function(A):\n",
    "    \"\"\"\n",
    "    从收益矩阵中计算maxmin的算法\n",
    "    :param A: 收益矩阵\n",
    "    :return: 计算得到的maxmin结果\n",
    "    \"\"\"\n",
    "    column_min = []\n",
    "    for i in range(len(A)):\n",
    "        # 计算每一列的最小值\n",
    "        column_min.append(A[:,i].min())\n",
    "        \n",
    "    # 计算每一列的最小值中的最大值\n",
    "    maxmin = max(column_min)\n",
    "    return maxmin"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "L5wM7ZoM0pQ9"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "maxmin = 1\n",
      "minmax = 2\n"
     ]
    }
   ],
   "source": [
    "# 创建收益矩阵\n",
    "A = np.array([[-1,2],[4,1]])\n",
    "# 计算maxmin\n",
    "maxmin = maxmin_function(A)\n",
    "# 计算minmax\n",
    "minmax = minmax_function(A)\n",
    "# 输出结果\n",
    "print(\"maxmin =\", maxmin)\n",
    "print(\"minmax =\", minmax)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "L5wM7ZoM0pQ9"
   },
   "source": [
    "**第4步：验证这时$\\max_j \\min_i a_{ij}$和$\\min_i \\max_j a_{ij}$的关系**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "L5wM7ZoM0pQ9"
   },
   "source": [
    "&emsp;&emsp;由上步可得：\n",
    "$$\n",
    "\\max_j \\min_i a_{ij} = 1 \\\\\n",
    "\\min_i \\max_j a_{ij} = 2\n",
    "$$\n",
    "这时$\\max_j \\min_i a_{ij} \\leqslant \\min_i \\max_j a_{ij}$成立。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KyLGn0eWlRbC"
   },
   "source": [
    "# 习题28.3\n",
    "\n",
    "&emsp;&emsp;计算以下两个概率分布的Jessen-Shannon散度，设$0 log0 = 0$。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KyLGn0eWlRbC"
   },
   "source": [
    "|        |        |        |        |        |\n",
    "| :----: | :----: | :----: | :----: | :----: |\n",
    "| 0.1 | 0.7 | 0.1 | 0.1 | 0 |\n",
    "| 0.2 | 0   | 0   | 0.8 | 0 |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "aFcQ8vLsaZZW"
   },
   "source": [
    "**解答：**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "aFcQ8vLsaZZW"
   },
   "source": [
    "**解答思路：**\n",
    "\n",
    "1. 给出Jessen-Shannon散度的定义\n",
    "2. 写出题中数据的Jessen-Shannon散度数值计算过程\n",
    "2. 使用自编程实现并验证计算结果"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**解答步骤：**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**第1步：Jessen-Shannon散度的定义**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;根据维基百科的Jessen-Shannon散度  \n",
    "（参考Wiki：https://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence ） \n",
    "> 给出两个概率分布$P$和$Q$，其Jessen-Shannon散度为：\n",
    "$$\n",
    "\\text{JS} (P \\| Q) = \\frac{1}{2} D(P \\| M) + \\frac{1}{2} D(Q \\| M)\n",
    "$$\n",
    "其中$\\displaystyle M = \\frac{1}{2} (P + Q)$，$D(\\cdot \\| \\cdot)$表示为KL散度。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;根据书中第537页附录E中KL散度的定义：\n",
    "> KL散度是描述两个概率分布$Q(x)$和$P(x)$相似度的一种度量，记作$D(Q \\| P)$。对离散随机变量，KL散度定义为\n",
    "> $$\n",
    "D(Q \\| P) = \\sum_i Q(i) \\log \\frac{Q(i)}{P(i)}\n",
    "$$\n",
    "对连续随机变量，KL散度定义为\n",
    "$$\n",
    "D(P \\| Q) = \\int Q(x) \\log \\frac{Q(x)}{P(x)} \\text{d} x\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**第2步：写出题中数据的Jessen-Shannon散度数值计算过程**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;由于题中数据是离散值，采用对离散随机变量的KL散度公式进行计算，得到：\n",
    "\n",
    "$$ \n",
    "P = [0.1, 0.7, 0.1, 0.1, 0] \\\\\n",
    "Q = [0.2, 0, 0, 0.8, 0] \\\\\n",
    "M = \\frac{1}{2} (P + Q) = [0.15,0.35,0.05,0.45,0] \n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;则对应KL散度为：\n",
    "$$ \n",
    "D(P \\| M) = \\sum_{i} P(i) \\ln \\frac{P(i)}{M(i)} = \\ln 2 - 0.3 * \\ln 3 \\\\\n",
    "D(Q \\| M) = \\sum_{i} Q(i) \\ln \\frac{Q(i)}{M(i)} = 3.6 * \\ln 2 -1.8 * \\ln 3 \n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "计算JS散度，得到：\n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\text{JS} (P \\| Q) \n",
    "&= \\frac{1}{2} D(P \\| M) + \\frac{1}{2} D(Q \\| M) \\\\\n",
    "&= \\frac{1}{2}(4.6*\\ln2 - 2.1 * \\ln3 ) \\\\\n",
    "&= 2.3* \\ln 2 - 1.05 * \\ln 3 \\\\\n",
    "&= 0.440696 \n",
    "\\end{aligned}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**第3步：使用自编程实现并计算结果**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "aFcQ8vLsaZZW"
   },
   "source": [
    "&emsp;&emsp;通过调用`scipy.stats`的`entropy`函数，根据题目中的两个概率分布进行计算，得到两个分布的Jessen-Shannon散度。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "id": "aFcQ8vLsaZZW"
   },
   "outputs": [],
   "source": [
    "from scipy.stats import entropy\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "id": "aFcQ8vLsaZZW"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Jessen-Shannon Distance = 0.440696\n"
     ]
    }
   ],
   "source": [
    "# 加载数据\n",
    "P = [0.1, 0.7, 0.1, 0.1, 0]\n",
    "Q = [0.2, 0, 0, 0.8, 0]\n",
    "\n",
    "# 计算z=(x+y)/2\n",
    "M =[(P[i] + Q[i]) / 2 for i in range(min(len(P),len(Q)))]\n",
    "\n",
    "# 计算P和M之间的KL散度，Q和M之间的KL散度\n",
    "DL_P_M = entropy(P, M)\n",
    "DL_Q_M = entropy(Q, M)\n",
    "\n",
    "# 计算JS散度\n",
    "result = (DL_P_M + DL_Q_M) / 2\n",
    "\n",
    "# 输出结果\n",
    "print(\"Jessen-Shannon Distance = {:f}\".format(result))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "aFcQ8vLsaZZW"
   },
   "source": [
    "可得到两个概率分布的Jessen-Shannon散度为0.440696。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "vCMvFPZWxqoO"
   },
   "source": [
    "# 习题28.4\n",
    "\n",
    "&emsp;&emsp;证明两个概率分布 $P$ 和 $Q$ 之间的 Jessen-Shannon 散度满足以下关系，当且仅当 $P$ 和 $Q$ 相同时取最小值 0，设对数是自然对数。\n",
    "\n",
    "$$ \n",
    "0 \\leqslant \\text{JS} (P \\| Q) \\leqslant \\ln 2\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4mPldxbkYXhx"
   },
   "source": [
    "**解答：**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4mPldxbkYXhx"
   },
   "source": [
    "**解答思路：**\n",
    "\n",
    "1. 给出两个概率分布$P$和$Q$之间的Jessen-Shannon散度\n",
    "2. 证明当且仅当$P$和$Q$相同时，Jessen-Shannon散度取最小值0\n",
    "3. 证明Jessen-Shannon散度取最大值 $\\ln 2$（设对数是自然对数）\n",
    "4. 结合上述证明，得到 $\\text{JS}(P \\| Q)$ 关系式"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4mPldxbkYXhx"
   },
   "source": [
    "**解答步骤：**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**第1步：两个概率分布 𝑃 和 𝑄 之间的Jessen-Shannon散度**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;根据维基百科的Jessen-Shannon散度  \n",
    "（参考Wiki：https://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence ） \n",
    "> 给出两个概率分布$P$和$Q$，其Jessen-Shannon散度为：\n",
    "$$\n",
    "\\text{JS} (P \\| Q) = \\frac{1}{2} D(P \\| M) + \\frac{1}{2} D(Q \\| M)\n",
    "$$\n",
    "其中$\\displaystyle M = \\frac{1}{2} (P + Q)$，$D(\\cdot \\| \\cdot)$表示为KL散度。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**第2步：证明当且仅当 $P$ 和 $Q$ 相同时，Jessen-Shannon散度取最小值0**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4mPldxbkYXhx"
   },
   "source": [
    "&emsp;&emsp;首先，将两个概率分布$P$和$Q$的Jessen-Shannon散度展开为Kullback–Leibler散度形式：\n",
    "$$\n",
    "\\text{JS} (P \\| Q) = \\frac{1}{2} D(P \\| M) + \\frac{1}{2} D(Q \\| M)\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;根据书中537页附录E\n",
    "> KL散度具有性质：$ D(P \\| Q) \\geqslant 0$。当且仅当 $Q = P$ 时，$D(P \\| Q) = 0$。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4mPldxbkYXhx"
   },
   "source": [
    "&emsp;&emsp;将上述KL散度的性质带入两个概率分布P和Q的Jessen-Shannon散度展开式可知：当且仅当$P$和$M$相同，且$Q$和$M$相同时，则有\n",
    "$$\n",
    "\\text{JS} (P \\| Q) = 0\n",
    "$$\n",
    "根据$M$的定义可知，则有\n",
    "$$\n",
    "P = Q = M = \\frac{1}{2}(P + Q)\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4mPldxbkYXhx"
   },
   "source": [
    "&emsp;&emsp;综上所述，可证得当前仅当$P$和$Q$相同时，Jessen-Shannon散度取最小值0。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4mPldxbkYXhx"
   },
   "source": [
    "**第3步：证明Jessen-Shannon散度取最大值 $\\ln2$（设对数是自然对数）**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4mPldxbkYXhx"
   },
   "source": [
    "&emsp;&emsp;可知两个概率分布$P$和$Q$的Jessen-Shannon散度可表示为：\n",
    "$$\n",
    "\\text{JS}(P \\| Q) = \\frac{1}{2} D(P \\| M) + \\frac{1}{2} D(Q \\| M)\n",
    "$$<br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4mPldxbkYXhx"
   },
   "source": [
    "&emsp;&emsp;将连续随机变量的KL散度公式带入展开，并且将$\\displaystyle M = \\frac{1}{2} (P + Q)$代入："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4mPldxbkYXhx"
   },
   "source": [
    "$$\n",
    "\\begin{aligned}\n",
    "\\text{JS} (P \\| Q) \n",
    "&= \\frac{1}{2} \\int P(x) \\ln \\left( \\frac{P(x)}{\\displaystyle \\frac{ P(x) + Q(x)}{2}} \\right) \\text{d} x + \\frac{1}{2} \\int Q(x) \\ln \\left ( \\frac{Q(x)}{\\displaystyle \\frac{P(x) + Q(x)}{2}}\\right) \\text{d} x \\\\\n",
    "&= \\frac{1}{2} \\int P(x) \\ln \\left( \\frac{2 P(x)}{ P(x) + Q(x) } \\right) \\text{d} x + \\frac{1}{2} \\int Q(x) \\ln \\left( \\frac{2 Q(x)} {P(x) + Q(x)} \\right) \\text{d} x \\\\\n",
    "&= \\frac{1}{2} \\int \\left[ P(x) \\ln \\left( \\frac{P(x)}{P(x) + Q(x)} \\right) + Q(x) \\ln \\left( \\frac{Q(x)}{P(x) + Q(x)} \\right)  \\right ] \\text{d} x +\\ln2\n",
    "\\end{aligned}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4mPldxbkYXhx"
   },
   "source": [
    "可知：\n",
    "$$\n",
    "\\ln \\left(\\frac{P(x)}{P(x) + Q(x)} \\right) \\leqslant 0 \\\\\n",
    "\\ln \\left(\\frac{P(x)}{P(x) + Q(x)} \\right) \\leqslant 0\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4mPldxbkYXhx"
   },
   "source": [
    "可得：\n",
    "$$\n",
    "\\frac{1}{2} \\int \\left [ P(x) \\ln \\left( \\frac{P(x)}{P(x) + Q(x)} \\right) + Q(x) \\ln \\left( \\frac{Q(x)}{P(x) + Q(x)} \\right ) \\right] \\text{d} x \\leqslant0\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4mPldxbkYXhx"
   },
   "source": [
    "所以：\n",
    "$$\n",
    "\\text{JS} (P \\| Q) = \\frac{1}{2} \\int \\left[ P(x) \\ln \\left( \\frac{P(x)}{P(x) + Q(x)} \\right) + Q(x) \\ln \\left( \\frac{Q(x)}{P(x) + Q(x)} \\right)  \\right ] \\text{d} x + \\ln2 \\leqslant \\ln 2\n",
    "$$\n",
    "\n",
    "当且仅当概率分布$P$和$Q$完全不重叠时，Jessen-Shannon散度的最大值为$\\ln 2$。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**第4步：结合上述证明，得到 $\\text{JS}(P \\| Q)$ 关系式**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;根据第2步和第3步，两个概率分布 $P$ 和 $Q$ 之间的 Jessen-Shannon 散度满足以下关系，当且仅当 $P$ 和 $Q$ 相同时取最小值 0，设对数是自然对数。\n",
    "\n",
    "$$ \n",
    "0 \\leqslant \\text{JS} (P \\| Q) \\leqslant \\ln 2\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "YRosWmpRzlo-"
   },
   "source": [
    "# 习题28.5\n",
    "\n",
    "&emsp;&emsp;考虑一维卷积运算，其输入是5维的向量 $\\boldsymbol{x}$，输出是3维向量 $\\boldsymbol{z}$。卷积核是 $\\boldsymbol{w} =(w_1, w_2 , w_3)$，步幅为1，填充为0。写出该卷积运算的矩阵表示，给出对应的转置卷积，并且验证原始卷积核 $\\boldsymbol{w}$ 和转置卷积核 $\\boldsymbol{w}'$ 之间有 $\\boldsymbol{w} = \\text{rot180} (\\boldsymbol{w}')$ 成立。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uv6msGEnwJkN"
   },
   "source": [
    "**解答：**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uv6msGEnwJkN"
   },
   "source": [
    "**解答思路：**\n",
    "\n",
    "1. 写出该卷积运算的矩阵表示\n",
    "2. 写出对应的转置卷积\n",
    "3. 证明原始卷积核 $\\boldsymbol{w}$ 和转置卷积核 $\\boldsymbol{w}'$ 满足 $\\boldsymbol{w} = \\text{rot180} (\\boldsymbol{w}')$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uv6msGEnwJkN"
   },
   "source": [
    "**解答步骤：**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**第1步：写出该卷积运算的矩阵表示**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;假设输入的5维向量 $\\boldsymbol{x}$ 表示为$[ x_1, x_2, x_3, x_4, x_5 ]^T $，输出的3维向量 $\\boldsymbol{z}$ 表示为$[ z_1, z_2, z_3]^T $"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;根据书中第509页对卷积运算的矩阵表示，可构建矩阵$C$：\n",
    "$$\n",
    "C = \n",
    "\\left [ \\begin{array}{ccccc}\n",
    "w_1 & w_2 & w_3 & 0 & 0  \\\\ 0 & w_1 & w_2 & w_3 & 0  \\\\\n",
    " 0 & 0 &w_1 & w_2 & w_3\n",
    "\\end{array} \\right ]\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;根据书中第509页关于矩阵 $C$ 的线性变换描述：\n",
    "> &emsp;&emsp;考虑基于矩阵 $C$ 的线性变换，其输入是输入矩阵展开的向量，输出是输出矩阵展开的向量。这个线性变换对应神经网络前一层道后一层的信号传递（正向传播），而以上卷积运算表示在这个线性变换中。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可得卷积运算的矩阵表示：\n",
    "$$\n",
    "\\left [ \\begin{array}{ccccc}\n",
    "w_1 & w_2 & w_3 & 0 & 0 \\\\ \n",
    "0 & w_1 & w_2 & w_3 & 0 \\\\\n",
    "0 & 0 &w_1 & w_2 & w_3\n",
    "\\end{array} \\right ] \\cdot\n",
    "\\left [ \\begin{array}{c}\n",
    "x_1 \\\\ x_2 \\\\ x_3 \\\\ x_4 \\\\ x_5\n",
    "\\end{array} \\right ] = \n",
    "\\left [ \\begin{array}{c}\n",
    "z_1 \\\\ z_2 \\\\ z_3\n",
    "\\end{array} \\right ]\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**第2步：写出对应的转置卷积**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;根据书中第509页对转置卷积的描述，可构建矩阵$C^T$\n",
    "$$\n",
    "C^T = \n",
    "\\left [ \\begin{array}{ccc}\n",
    "w_1 & 0 & 0  \\\\ \n",
    "w_2 & w_1 & 0 \\\\ \n",
    "w_3 & w_2 & w_1 \\\\ \n",
    "0 & w_3 & w_2 \\\\ \n",
    "0 & 0 & w_3\n",
    "\\end{array} \\right ]\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;根据书中第509页关于矩阵$C^T$的线性变换描述：\n",
    "> 考虑基于转置矩阵$C^T$的线性变换。这个线性变换对应神经网络后一层到前一层的信号传递（反向传播）。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可得对应的转置卷积为\n",
    "$$\n",
    "\\left [ \\begin{array}{ccc}\n",
    "w_1 & 0 & 0  \\\\ \n",
    "w_2 & w_1 & 0 \\\\ \n",
    "w_3 & w_2 & w_1 \\\\ \n",
    "0 & w_3 & w_2 \\\\ \n",
    "0 & 0 & w_3\n",
    "\\end{array} \\right ] \\cdot \n",
    "\\left [ \\begin{array}{c}\n",
    "z_1 \\\\ z_2 \\\\ z_3\n",
    "\\end{array} \\right ] = \n",
    "\\left [ \\begin{array}{c}\n",
    "x_1 \\\\ x_2 \\\\ x_3 \\\\ x_4 \\\\ x_5\n",
    "\\end{array} \\right ]\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uv6msGEnwJkN"
   },
   "source": [
    "这个转置卷积是核矩阵为$\\boldsymbol{w}' = (w_3 , w_2, w_1)$、填充为2、步幅为1的卷积运算。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**第3步：证明原始卷积核 $\\boldsymbol{w}$ 和转置卷积核 $\\boldsymbol{w}'$ 满足 $\\boldsymbol{w} = \\text{rot180} (\\boldsymbol{w}')$**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uv6msGEnwJkN"
   },
   "source": [
    "&emsp;&emsp;因为原始卷积核$\\boldsymbol{w} = (w_1, w_2, w_3)$，而转置卷积核$\\boldsymbol{w}' = (w_3 , w_2, w_1)$，可得：\n",
    "$$ \n",
    "\\boldsymbol{w} = \\text{rot180} (\\boldsymbol{w}')\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cF04bO73cnqh"
   },
   "source": [
    "# 习题28.6\n",
    "\n",
    "&emsp;&emsp;写出图28.8中转置卷积的大小和原始卷积的大小之间的关系，转置卷积有输入矩阵尺寸 $\\hat{I}'$、卷积核尺寸 $K'$、步幅 $S'$、填充尺寸 $P'$、输出矩阵尺寸 $O'$。\n",
    "\n",
    "![28-6.png](../images/28-6.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "C7HkGerB4eSq"
   },
   "source": [
    "**解答：**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "C7HkGerB4eSq"
   },
   "source": [
    "**解答思路：**\n",
    "\n",
    "1. 给出转置卷积的大小计算\n",
    "2. 写出图中转置卷积的大小\n",
    "3. 写出图中原始卷积的大小\n",
    "4. 写出图中转置卷积的大小和原始卷积的大小之间的关系"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "C7HkGerB4eSq"
   },
   "source": [
    "**解答步骤：**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**第1步：给出转置卷积的大小计算**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;根据书中第510页转置卷积的大小计算\n",
    "> 首先，计算原始卷积的大小。这里考虑简单的情况。假设输入矩阵是方阵，卷积核矩阵也是方阵。设 $I$ 是输入矩阵的尺寸，$K$ 是卷积核的尺寸，$P$ 是填充的尺寸，$S$ 是步幅。输出矩阵的尺寸 $O$ 满足\n",
    "> $$\n",
    "O = \\frac{I + 2P - K}{2} + 1 \\tag{28.13}\n",
    "$$\n",
    "> 这里考虑可以整除的情况，式（28.13）可以改为对应的形式：\n",
    "> $$\n",
    "I = \\frac{[O + (O - 1)(S - 1)] + 2(K - P - 1)- K}{1} + 1\n",
    "$$\n",
    "> 接着，计算转置卷积的大小。设 $I'$ 是输入矩阵的尺寸，$K'$ 是卷积核的尺寸，$P'$ 是填充的尺寸，$S'$ 是步幅。输出矩阵的尺寸$O'$ 满足\n",
    "> $$\n",
    "O' = \\frac{I' + 2P' - K'}{S} + 1\n",
    "$$\n",
    "> 这里也考虑可以整除的情况。转置卷积的输出矩阵尺寸 $O'$ 与原始卷积的输入矩阵尺寸 $I'$ 相同。因此，可以推算，当 $S = 1, P = 0 $时，转置卷积的大小和原始卷积的大小之间有以下关系：\n",
    "> $$\n",
    "I' = O, P' = K - 1, K' = K, S' = 1 \\\\\n",
    "O' = O + K - 1\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**第2步：写出图中转置卷积的大小**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;根据图中的信息，转置卷积的输入矩阵（插入0向量后）尺寸为5，卷积核尺寸为3，步幅为1，填充尺寸为1， 输出矩阵尺寸为5，即 \n",
    "$\\hat{I}' = 5, K'=3, S' = 1, P' = 1, O' = 5$。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**第3步：写出图中原始卷积的大小**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;根据图中的信息，原始卷积输入矩阵尺寸为5，卷积核尺寸为3，步幅为1，填充尺寸为0，输出矩阵尺寸为3，即\n",
    "$I = 5, K = 3, S = 1, P = 0, O = 3$。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**第4步：写出图中转置卷积的大小和原始卷积的大小之间的关系**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&emsp;&emsp;当 $S=1, P=0$ 时，转置卷积的大小和原始卷积的大小之间有以下关系成立：\n",
    "$$\n",
    "\\hat{I}' = O + (O-1) \\\\\n",
    "P' = K - 2 \\\\\n",
    "K' = K \\\\\n",
    "S' = 1 \\\\\n",
    "O' = O + K - 1\n",
    "$$"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [
    "KyLGn0eWlRbC",
    "vCMvFPZWxqoO"
   ],
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
